Abstract: In this paper we investigate the performance of periodogram
based estimators of the spectral density matrix of possibly high-dimensional 
time series. We suggest and study shrinkage as a remedy against numerical 
instabilities due to deteriorating condition numbers of (kernel) smoothed 
periodogram matrices. Moreover, shrinking the empirical eigenvalues in the 
frequency domain towards one another also improves at the same time the 
Mean Squared Error (MSE) of these widely used nonparametric spectral 
estimators. Compared to some existing time domain approaches, restricted 
to i.i.d. data, in the frequency domain it is necessary to take the size of 
the smoothing span as "effective or local sample size" into account. While 
Böhm and von Sachs (2007) propose a multiple of the identity matrix as
optimal shrinkage target in the absence of knowledge about the multidi- 
mensional structure of the data, here we consider "structural" shrinkage. 
We assume that the spectral structure of the data is induced by underlying 
factors. However, in contrast to actual factor modelling suffering from the 
need to choose the number of factors, we suggest a model-free approach. 
Our final estimator is the asymptotically MSE-optimal linear combination 
of the smoothed periodogram and the parametric estimator based on an
underfitting (and hence deliberately misspecified) factor model. We complete
our theoretical considerations with extensive simulation studies. For
data generated from a higher-order factor model, we compare all
four types of estimators involved (including the one of Böhm and von Sachs
(2007)).

Received April 2008. 

1. Introduction 

Spectral analysis of multivariate time series is known to be a useful tool to anal- 
yse not only serial but also cross-correlations of dynamic data of possibly high 
dimension (Shumway and Stoffer, 2000). In the absence of some possibly restrictive
parametric assumptions on the dynamics of the time series (such as vector
autoregressive moving average of finite order), the standard nonparametric
approach of smoothing the periodogram matrix over frequency usually shares
well-established properties that are generally satisfactory even for moderate sample sizes,
such as approximate unbiasedness, approximate uncorrelatedness over different
frequencies and the usual variance-bias trade-off known from classical nonparametric
theory (Brillinger (1975)). What is less known and explored, however,
and highly relevant for more and more frequently met situations of large dimen- 
sionality of the time series, is the deterioration of the condition number of the 
resulting nonparametric estimator (smoothed periodogram matrix). It is known 
that a high condition number of such a matrix, i.e. the ratio $\lambda_{\max}/\lambda_{\min}$ of its
largest to its smallest eigenvalue, leads to numerical instabilities, in particular 
when the (estimated) spectral density matrix is used subsequently in sensitive 
functionals such as its inverse or its determinant. A prominent example of
the latter is the Kullback-Leibler discrimination information
(Kullback and Leibler, 1952), used as a measure of disparity between several
estimated multivariate spectra (as in, e.g., Kakizawa, Shumway and Taniguchi (1998))
for the classification of multivariate time series.

In many fields of application, including economic panel data (Bai and Ng,
2002; Forni, Hallin, Lippi and Reichlin, 2000), but also genetic engineering or
neuropsychology, the dimension of the data can come close to the sample size,
which in particular brings the smoothed periodogram close to a singular
matrix.

In this paper we suggest a remedy to improve upon the smoothed peri- 
odogram as an estimator for the multivariate spectrum using regularization, i.e. 
shrinkage, techniques. It is known from the statistical literature on estimation 
in i.i.d. data situations (Haff, 1977, 1979, 1980), that shrinkage helps to correct 
the following effect: the dispersion of the sample eigenvalues can be tremen- 
dously larger than the dispersion of the population eigenvalues of the spectrum 
as the large eigenvalues are biased upwards, the small ones downwards (Jolliffe, 
2002). Thus, the quality of an estimator of a high-dimensional target can be 
improved, by shrinking the eigenvalues towards one another, not only numer- 
ically, but even on the level of the widely used criterion of mean square error 
(Beran and Dümbgen, 1998; Ledoit and Wolf, 2004).

We note that this technique is more than just standardizing each dimension 
of the time series - which would improve the condition number in case of min- 
imally coherent data, but not so with potentially highly cross-correlated data 
(the interdependence over dimension being responsible for the aforementioned
dispersion effect). 

Compared to existing work on shrinkage in the time domain, we show that in 
the frequency domain it is necessary to take the size of the smoothing span m as 
"effective or local sample size" into account. We note that simply choosing the 
smoothing span of the smoothed periodogram sufficiently large is no reasonable 
solution to the problem: depending on the roughness of the true spectral density 
to be estimated, this might result in substantial oversmoothing.

For reasons of notational simplicity, in this work, we consider as simplest 
smoothing method the averaged periodogram, that is a symmetric kernel 
smoother of finite support ( "boxcar" ) with equal weights for each periodogram 
ordinate within the smoothing span. One can easily check that all the results of 
our paper carry directly over to the more frequently used kernels in the
smoothing literature.

Our proposed shrinkage estimator is, pointwise at frequency $\omega \in (0, 2\pi]$,
a convex combination of the averaged periodogram $\tilde f_T(\omega)$ with some
shrinkage target $f^{\mathrm{tar}}_T(\omega)$ in the frequency domain. That is, our estimators are of the form
$\hat f_T(\omega) := \zeta_T(\omega) f^{\mathrm{tar}}_T(\omega) + (1 - \zeta_T(\omega)) \tilde f_T(\omega)$, where, in order to reduce the
dispersion of the eigenvalues of $\tilde f_T(\omega)$, the factor $\zeta_T$ is chosen such that the sample
eigenvalues are shrunk towards each other linearly. The most direct target to use
would be (a multiple of) the identity matrix, i.e. $f^{\mathrm{tar}}_T(\omega) = \mathrm{Id}$. This set-up
has been treated by the authors in a companion paper (Böhm and von Sachs,
2007), where they determine the optimal amount of shrinkage by a data driven
approach in an asymptotic framework in which the dimension grows with the
sample size.
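
The effect of such a convex combination on the eigenvalues can be illustrated with a small numerical sketch. The 2 x 2 matrix, the intensity 0.2 and the choice of the average eigenvalue as the multiple of the identity are assumptions made for this illustration only; they are not taken from the paper.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (ascending) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def shrink_to_identity(a, b, d, rho):
    """Convex combination rho * mu * Id + (1 - rho) * S, with mu = trace(S) / 2."""
    mu = 0.5 * (a + d)  # average eigenvalue; an assumed choice of the multiple
    return (rho * mu + (1 - rho) * a, (1 - rho) * b, rho * mu + (1 - rho) * d)

# Hypothetical, nearly singular 2 x 2 "estimate" (illustrative numbers only).
S = (1.0, 0.99, 1.0)
lo, hi = eig_sym_2x2(*S)
cond_raw = hi / lo

lo_s, hi_s = eig_sym_2x2(*shrink_to_identity(*S, rho=0.2))
cond_shrunk = hi_s / lo_s

print(cond_raw, cond_shrunk)
```

Each eigenvalue $\lambda$ of the raw matrix is mapped to $0.2\mu + 0.8\lambda$, so the eigenvalues move towards their mean and the condition number drops accordingly.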

Obviously, the technique of shrinkage has a certain relationship to ridge re- 
gression. In fact, a linear combination of a sample covariance and the identity 
matrix has been used as original motivation for ridge regression (Hoerl and Kennard,
1970). However, shrinkage of empirical covariance matrices or spectral density 
estimators is not the same thing as ridge estimation. In the first approach, 
the eigenvalues of the matrices under consideration are shrunken towards each 
other, and hence their dispersion is reduced. Constructing a ridge, all eigenval- 
ues are moved away from zero (either by the same amount for ordinary ridge 
or by some individual constant for generalized ridge regression) in order to 
regularize the estimator. Recent theoretical work, e.g. by Bickel and Levina (2007)
and Rothman, Levina and Zhu (2008), using considerably more refined techniques such
as the Lasso or thresholding for the regularization of large-dimensional covariance
matrices, is in this latter spirit.
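
The distinction can be written out for the eigenvalues $\lambda_1, \dots, \lambda_p$ of the matrix being regularized. The symbols $c$, $\rho$ and $\mu$ below are notation assumed for this illustration (ridge constant, shrinkage intensity and the multiple of the identity used as target); they do not appear in the text above.

```latex
\text{ordinary ridge:}\quad \lambda_i \mapsto \lambda_i + c, \qquad
\lambda_{\max} - \lambda_{\min} \mapsto \lambda_{\max} - \lambda_{\min}, \qquad c > 0,
\\[4pt]
\text{shrinkage to } \mu\,\mathrm{Id}:\quad \lambda_i \mapsto (1-\rho)\lambda_i + \rho\mu, \qquad
\lambda_{\max} - \lambda_{\min} \mapsto (1-\rho)\,(\lambda_{\max} - \lambda_{\min}), \qquad 0 \le \rho \le 1 .
```

Under ordinary ridge the spread of the eigenvalues is unchanged and only their location moves away from zero, whereas under shrinkage the spread itself contracts by the factor $1-\rho$.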

Using the identity matrix as a shrinkage target is reasonable if there is little 
or no knowledge about the underlying multidimensional structure of the data. 
In this case, a shrinkage target should be used that imposes the least possible 
amount of structure and which, at the same time, has the best of all possi- 
ble condition numbers. In many settings, however, it is reasonable to assume 
that the covariance or spectral structure of the data is induced by underlying, 
known or hidden, factors. The general idea underlying factor models is that p 
observed random variables can be expressed, except for an error term, as linear 
functions of q < p random factors. For instance, in econometrics, markets are 
usually assumed to be driven by underlying global variables such as interest 
rate, employment rate or gross national product. The models range from simple
one-factor models, as in Sharpe (1963), to sophisticated approaches that use 
multiple global and industry specific factors that may be intercorrelated, as, 
e.g., in Forni et al. (2000).

A disadvantage of factor models is that, usually, the number of factors is 
a parameter that must be either specified a priori or chosen by somewhat so- 
phisticated data-driven procedures akin to model selection. Research on how 
to propose a generally satisfying criterion is still going on (Bai and Ng, 2002; 
Hallin and Liska, 2007), and it would be interesting to avoid this problem while 
taking advantage of the structure imposed by a factor model as a remedy for
the curse of dimensionality.

We have developed a hybrid approach to circumvent the dilemma of model 
choice and still retain the advantages of factor analysis. We combine a nonparametric
estimator, in our case the averaged periodogram $\tilde f_T(\omega)$, with a parametric
estimator $\hat f^1_T(\omega)$ of the spectral matrix. The latter is our new shrinkage target.
It is given by fitting a one-factor model to the data. However, we do not as- 
sume that this model is true; rather, we believe that the data follow a more 
complicated structure. This may be a $q$-factor model ($q > 1$), a model driven by
different layers of factors, or the model may be completely unknown. By com- 
bining a shrinkage target, which is actually underfitted, with a nonparametric 
estimator of the spectrum, we circumvent the problem of model choice. In a 
data driven approach, weights are chosen such that the new, hybrid estimator is 
the asymptotically optimal linear combination of two conventional estimators. 
The first component, the averaged periodogram, is asymptotically unbiased but 
has high variance. The second component is biased due to misspecification but, 
by imposing structure, has low variance. 

We note that, instead of choosing a one-factor model as our shrinkage target, 
we might as well opt for something more complicated, e.g. a $q$-factor model with
$q > 1$. The only prerequisite for doing this is having background knowledge that
the underlying structure is more complicated than the shrinkage target, e.g. a
$\tilde q$-factor model with $\tilde q > q$. The theory we will give in section 3.1 can easily be adapted
to such a case.

To the best of the authors' knowledge, there is no literature on shrinkage to a 
factor model in time series analysis. In the literature on finance, an approach to 
shrink to a factor model has been developed in the context of portfolio selection 
(Ledoit and Wolf, 2003) under iid assumptions on the data. However, the idea 
of shrinking a nonparametric fit towards a parametric estimator drives quite 
generally a variety of existing approaches, among which one finds the work of 
Daniels and Cressie (2001), and to some extent, of Botts and Daniels (2006), in
the context of Bayesian covariance and spectral estimation, respectively. 

The remainder of this paper is organized as follows: in the next section, we 
will develop the theoretical background for data driven shrinkage to a 'market' 
one-factor model, where the term 'market' is just a wildcard term that does not 
necessarily mean that we are in an economic context. We will first give the basic 
assumptions and definitions in the following subsection. In section 3.2, we will 
introduce the shrinkage target, which is a one-factor model. The model assumptions
are that the $p$-dimensional process is driven by a dynamic, hidden or known,
underlying process with spectral density $f_0(\omega)$. We will fit this model to the data;
however, at the same time we assume that the model is misspecified. The philosophy
behind this is that the model is just a parsimonious tool for describing the
data. In sections 3.4 and 3.5 we derive the MSE-optimal solution for the shrinkage
intensity, which is a function of the true spectral density $f(\omega)$. In section 3.6,
we will examine the asymptotic behaviour of the MSE-optimal shrinkage intensity
$\zeta_T(\omega)$, which will help us to develop a data driven estimator in section 3.7.
Comprehensive Monte Carlo studies will show the usefulness of our estimator
in section 4. We note that most proofs are relegated to an appendix section.

2. Multivariate spectral analysis 

We assume that we observe a realization $(X_t)_{t=1}^{T}$ of a $p$-dimensional real-valued,
centered stationary Gaussian time series $(X_t)$. We aim at estimating the $p \times p$
spectral density matrix function at frequency $\omega \in (0, 2\pi]$,

$$f(\omega) = \frac{1}{2\pi} \sum_{u=-\infty}^{\infty} \mathrm{Cov}(X_t, X_{t+u}) \exp(-\iota \omega u), \qquad \omega \in (0, 2\pi], \tag{1}$$

where $\iota = \sqrt{-1}$. The most common nonparametric estimators of (1) are based
on the periodogram. If we denote by

$$d_T(\omega) = \frac{1}{\sqrt{2\pi T}} \sum_{t=1}^{T} X_t \exp(-\iota \omega t), \qquad \omega \in (0, 2\pi], \tag{2}$$

the vector-valued discrete Fourier transform of the realization $(X_t)_{t=1}^{T}$, then the
$p \times p$ periodogram matrix is defined as

$$I_T(\omega) := d_T(\omega)\, d_T^{*}(\omega), \tag{3}$$

where $*$ means conjugate complex transpose. Furthermore, we will denote the
complex conjugate (of a scalar value) by an overline. The periodogram is not a
consistent estimator of the spectrum (1), but it is asymptotically unbiased. Moreover,
for $p > 1$, the periodogram is a singular matrix: if $d_T(\omega) = (d_1(\omega), \dots, d_p(\omega))'$,
then (3) can be expressed as

$$I_T(\omega) = \begin{pmatrix} d_1(\omega) \\ \vdots \\ d_p(\omega) \end{pmatrix} \begin{pmatrix} \overline{d_1(\omega)} & \cdots & \overline{d_p(\omega)} \end{pmatrix} \tag{4}$$

and thus has almost surely rank 1. If the periodogram is smoothed over frequency,
the estimators derived this way are consistent under a classical asymptotic
framework. We will restrict ourselves to the simplest form of smoothing,
the averaged periodogram with smoothing span $m_T$, where the conditions
$m_T/T \to 0$ and $m_T \to \infty$ as $T \to \infty$ guarantee consistency and asymptotic
unbiasedness:

$$\tilde f_T(\omega) := \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_T(\omega + \omega_k), \tag{5}$$

where $\omega_k$ denotes the Fourier frequency $2\pi k/T$.
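
The definitions of the discrete Fourier transform, the periodogram matrix and the averaged periodogram can be sketched in a few lines. This is a minimal illustration under assumptions: the function names, the toy bivariate series and the span of 5 are hypothetical, and the direct summation is chosen for readability rather than speed.

```python
import cmath
import math

def dft_vector(x, omega):
    """Vector-valued DFT of a p-variate series x (a list of T rows of length p)."""
    T, p = len(x), len(x[0])
    scale = 1.0 / math.sqrt(2.0 * math.pi * T)
    return [scale * sum(x[t][i] * cmath.exp(-1j * omega * (t + 1)) for t in range(T))
            for i in range(p)]

def periodogram(x, omega):
    """Periodogram matrix d d^*: a Hermitian p x p matrix of rank one."""
    d = dft_vector(x, omega)
    return [[di * dj.conjugate() for dj in d] for di in d]

def averaged_periodogram(x, omega, m):
    """Equal-weight average of the periodogram over m (odd) Fourier frequencies."""
    T, p = len(x), len(x[0])
    half = (m - 1) // 2
    acc = [[0j] * p for _ in range(p)]
    for k in range(-half, half + 1):
        I_k = periodogram(x, omega + 2.0 * math.pi * k / T)
        for i in range(p):
            for j in range(p):
                acc[i][j] += I_k[i][j] / m
    return acc

# Toy deterministic bivariate series (illustrative only).
T = 32
x = [[math.sin(0.7 * t) + 0.5 * math.cos(2.1 * t), math.cos(1.3 * t)]
     for t in range(1, T + 1)]
omega = 2.0 * math.pi * 8 / T

I = periodogram(x, omega)
det_I = I[0][0] * I[1][1] - I[0][1] * I[1][0]   # vanishes: rank one

F = averaged_periodogram(x, omega, m=5)
det_F = F[0][0] * F[1][1] - F[0][1] * F[1][0]   # positive after averaging
```

The determinant of the raw periodogram vanishes up to rounding, while averaging over several frequencies yields a matrix of full rank whose condition number may nevertheless be large.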



3. Theoretical framework
3.1. Setup and assumptions 

Our aim is to estimate the spectrum $f(\omega)$ of a $p$-variate Gaussian time series.
We assume that we have realizations

$$(X_{it})_{t \in \{1,\dots,T\}} = X_{i1}, \dots, X_{iT}, \qquad i = 1, \dots, p.$$

Moreover, we assume that we have realizations from another, one-dimensional
time series

$$(X_{0t})_{t \in \{1,\dots,T\}} = X_{01}, \dots, X_{0T},$$

to which we refer as the market or exogenous time series. The market time series
is thought to be a process that has a certain explanatory value for the other
time series $(X_{it})$, $i = 1, \dots, p$. One possible choice is to use the average over
dimension in the time domain of the $(X_{it})_{i=1,\dots,p}$,

$$X_{0t} = \frac{1}{p} \sum_{i=1}^{p} X_{it}.$$

However, we make no special assumptions on the market time series. It would
equally be possible to choose an external variable or the first principal component
of the data.

We make the following assumptions: 

Assumption 3.1. All our time series, including the market time series, are centered,

$$E X_{it} = 0, \qquad i = 0, \dots, p,$$

and stationary.

In this paper, purely to simplify the presentation, we do not
present our estimation results in terms of the spectrum directly, but rather
choose the expected averaged periodogram

$$f^0_T(\omega) := E \tilde f_T(\omega), \tag{6}$$

with $\tilde f_T(\omega)$ the averaged periodogram (5), as estimation target. This is possible without loss of generality because the
expected periodogram $f^0_T(\omega)$ approaches the true spectrum $f(\omega)$ at a rate of
convergence sufficiently fast to enable us to carry over our proofs immediately to
the estimation of $f(\omega)$. In order to do so we make the following assumption:

Assumption 3.2. If

$$\gamma_{ij}(h) = E X_{it} X_{j,t+h}, \qquad i, j = 0, \dots, p, \quad h \in \mathbb{Z},$$

denotes the autocovariance function, then

$$\sum_{h=-\infty}^{\infty} |h|\, |\gamma_{ij}(h)| < \infty \qquad \forall\, i, j = 0, \dots, p.$$

Then, we have the following well-known result from Brillinger (1975) or
Shumway and Stoffer (2000):

Lemma 3.1. Under assumption 3.2, $f(\omega)$ has (elementwise and for real and
imaginary parts separately) continuous derivatives of order one, and hence

$$f^0_T(\omega) - f(\omega) = O(m_T/T).$$

The enhanced estimator we want to construct is obtained by linearly combining
a standard nonparametric estimator, in our case the averaged periodogram, with
a shrinkage target. The latter is obtained by fitting a one-factor model to the data,
where the time series $X_{0t}$ is assumed to be the underlying factor.

We assume the dimension $p$ to be fixed while $T \to \infty$. We denote the
$i$th component of the discrete Fourier transform of the data at frequency $\omega$ as
$d_i(\omega)$.

We furthermore make the following notational convention: whenever we use
vector- or matrix-valued terms, we will mean the respective $p$-dimensional vector
or the $p \times p$ matrix unless we explicitly state otherwise. Thus, $f(\omega)$, $f^0_T(\omega)$
and $\tilde f_T(\omega)$ refer to the spectrum, expected averaged periodogram and averaged
periodogram, respectively, of the time series $(X_{it})_{i=1,\dots,p}$. We will also refer to
the $p$-dimensional vector of the time series at time $t$ as $X_t$. However, when we
look at components, we will use the index value zero to refer to the market time
series. E.g., we refer to the cross-spectrum between the market series and the
first component of $X_t$ as $f_{01}(\omega)$.

3.2. One-factor model

The shrinkage target is given by fitting a one-factor model to the data $(X_{it})$, $i =
1, \dots, p$, which we will define in this section. We will use a different notation for
the random variables to emphasize that this model is not assumed to hold true
for the data $X_{it}$. Rather, we use the model as a parsimonious tool to approximate
the spectral structure of the process.

Let us assume that we have a univariate exogenous time series $X_{0t}$, $t =
1, \dots, T$, with spectrum $f_0(\omega)$. When we speak of exogenous, we mean that this
data $X_{0t}$ can be used as a factor time series that has some explanatory value for
the data in the sense of the following model:

$$X_{it} = \beta_i X_{0t} + \varepsilon_{it}, \qquad i = 1, \dots, p. \tag{7}$$

The weights $\beta_i \in \mathbb{R}$ are non-random. The idiosyncratic components $\varepsilon_{it}$ are
assumed to be normally distributed and independent over time and dimension,
and independent of $(X_{0t})$:

(8)

In this simple factor model, all serial correlation in the data $X_{it}$ originates from
serial correlation in the exogenous time series $X_{0t}$. The serial correlation of the
exogenous time series is determined by its spectrum $f_0(\omega)$.
The fact that the idiosyncratic components are uncorrelated over time and
dimension is important, as otherwise it would be impossible to identify
the model under classical asymptotics (Forni et al., 2000). Together with the
independence between the idiosyncratic components and the exogenous time
series, this has two more advantages: first, it will allow us to use linear regression
to estimate the $\beta_i$ and the $(\sigma^{\varepsilon}_i)^2$. Second, this model implies, simply by linearity,
the following relationship for the DFTs of the data:

Lemma 3.2.

$$d_i(\omega) = \beta_i d_0(\omega) + d^{\varepsilon}_i(\omega) \tag{9}$$

where $d^{\varepsilon}_i(\omega)$ is the DFT of the idiosyncratic components. Furthermore,

$$d^{\varepsilon}_i(\omega) \sim \mathcal{N}^{C}\big(0, (\sigma^{\varepsilon}_i)^2\big) \qquad \forall\, \omega. \tag{10}$$

We see from (10) that the variance in the idiosyncratic components is independent
of frequency. Furthermore, the weights $\beta = (\beta_1, \dots, \beta_p)'$ are independent
of the frequency, too, due to (7). This means that the spectrum under the
above specified one-factor model (7) is

$$f^1(\omega) = \beta\beta' f_0(\omega) + \Delta \tag{11}$$

where

$$\Delta = \begin{pmatrix} (\sigma^{\varepsilon}_1)^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & (\sigma^{\varepsilon}_p)^2 \end{pmatrix}. \tag{12}$$

When it comes to estimation of the one-factor model, we will, as mentioned above,
identify the spectrum with the expected averaged periodogram. Thus, instead
of using the model (11), we will use the slightly modified model

$$f^1_T(\omega) = \beta\beta' f^0_{00}(\omega) + \Delta, \tag{13}$$

where $f^0_{00}(\omega)$ means the expected averaged periodogram of the factor time
series $X_{0t}$.



3.3. Estimation of one-factor model 

The model (13) is assumed not to hold true. However, even under these circumstances,
it is possible to fit this model to the time series $X_{it}$ by choosing weights
$\beta_i$ such that the $L_2$ distance between $f^0_T(\omega)$ and $\beta\beta' f^0_{00}(\omega)$ becomes minimal.

We will refer to this minimum $L_2$ distance spectral density under the
one-factor model as $f^1_T(\omega)$.

The fact that both the weights $\beta_i$ and the idiosyncratic variances $(\sigma^{\varepsilon}_i)^2$ are independent
of lag and frequency, respectively, enables us to estimate these parameters
with standard methods. We use linear regression to obtain the following
estimators $b_i$ for the $\beta_i$:

$$b_i = \frac{\sum_{t=1}^{T} X_{0t} X_{it}}{\sum_{t=1}^{T} X_{0t}^2}, \tag{14}$$

which is just the standard estimator of the slope in linear regression.

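Next, we need to estimate the variances $(\sigma^{\varepsilon}_i)^2$ of the idiosyncratic components.
The standard way to do this is again to use the time domain estimator
of the residual variance, which we normalize by $1/2\pi$:

$$(s^{\varepsilon}_i)^2 = \frac{1}{2\pi T} \sum_{t=1}^{T} (X_{it} - b_i X_{0t})^2. \tag{15}$$

Furthermore, both estimators have the convenient property of being consistent,
and the stochastic rate of convergence is in both cases $1/\sqrt{T}$ (Sachs and Hedderich,
2006):

$$b_i = \beta_i + O_p\!\left(\frac{1}{\sqrt{T}}\right) \tag{16}$$

and

$$(s^{\varepsilon}_i)^2 = (\sigma^{\varepsilon}_i)^2 + O_p\!\left(\frac{1}{\sqrt{T}}\right). \tag{17}$$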
Plugging the estimators from (16) and (17) and the averaged periodogram of
the factor time series,

$$\tilde f_{00}(\omega) = \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_{00}(\omega + \omega_k),$$

into the definition of the one-factor model (13), we obtain an estimator of the
multivariate spectrum that is based on a one-factor model:

$$\hat f^1_T(\omega) = b b' \tilde f_{00}(\omega) + D, \tag{18}$$

where

$$D = \mathrm{diag}\big((s^{\varepsilon}_1)^2, \dots, (s^{\varepsilon}_p)^2\big).$$

This estimator is our shrinkage target. By construction of the model, with equations
(14) and (15), we observe that on the diagonal $\hat f^1_T(\omega) = \tilde f_T(\omega)$.
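
The estimation steps of this subsection can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the divisor $T$ in the residual variance is an assumed normalization, and the value passed for the averaged periodogram of the market series is an arbitrary number rather than an estimate.

```python
import math

def fit_one_factor(x, x0):
    """Regress each centred series X_i on the centred market series X_0.

    Returns the slopes b_i and the residual variances on the spectral scale,
    i.e. divided by 2 * pi (the divisor T is an assumption of this sketch).
    """
    T, p = len(x), len(x[0])
    sxx = sum(v * v for v in x0)
    b = [sum(x0[t] * x[t][i] for t in range(T)) / sxx for i in range(p)]
    s2 = [sum((x[t][i] - b[i] * x0[t]) ** 2 for t in range(T)) / (2.0 * math.pi * T)
          for i in range(p)]
    return b, s2

def one_factor_target(b, s2, f00):
    """Target b b' f00 + D at one frequency; f00 is the averaged periodogram of X_0."""
    p = len(b)
    return [[b[i] * b[j] * f00 + (s2[i] if i == j else 0.0) for j in range(p)]
            for i in range(p)]

# Toy data with exact one-factor structure and no noise (loadings 2 and -1).
x0 = [1.0, -1.0, 2.0, -2.0]
x = [[2.0 * v, -1.0 * v] for v in x0]
b, s2 = fit_one_factor(x, x0)
target = one_factor_target(b, s2, f00=0.5)
```

In this noise-free toy case the slopes recover the loadings exactly and the residual variances vanish, so the target reduces to the rank-one part $bb'\tilde f_{00}(\omega)$.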

3.4. Optimal shrinkage intensity

Our aim is to improve upon the averaged periodogram by shrinking to a target 
matrix function that is more regular, at the price of possibly having larger bias. 
Here, we make the assumption that a one-factor model is not far too crude 
an approximation. We do, however, not believe that the underlying structure 
is totally explained; we even make the opposite assumption, namely that the 
model is misspecified: 

Assumption 3.3. There exists a $\delta > 0$ such that, uniformly over all frequencies
$\omega \in [0, 2\pi]$ and all $i, j = 1, \dots, p$, we have

$$|f^1_{ij}(\omega) - f^0_{ij}(\omega)| > \delta. \tag{19}$$

Assumption 3.3 is made for technical reasons: the estimator of the shrinkage
intensity which we are going to derive will have an estimator of the difference
$f^1_{ij}(\omega) - f^0_{ij}(\omega)$ in the denominator. Because of this, assumption 3.3 is needed
to avoid problems of identifiability.

We search for a linear combination

$$\hat f_T(\omega) = \hat\zeta_T(\omega)\, \hat f^1_T(\omega) + (1 - \hat\zeta_T(\omega))\, \tilde f_T(\omega)$$

of the one-factor estimator $\hat f^1_T(\omega)$ and the averaged periodogram $\tilde f_T(\omega)$,
where $\hat\zeta_T(\omega)$ is a data driven estimator of an optimal, oracle shrinkage intensity
$\zeta_T(\omega)$ that is the solution of the minimization problem

$$\min\; E \big\| \hat f_T(\omega) - f^0_T(\omega) \big\|^2, \tag{20}$$

that is,

$$\zeta_T(\omega) = \arg\min_{z_T(\omega)} E \big\| z_T(\omega) \hat f^1_T(\omega) + (1 - z_T(\omega)) \tilde f_T(\omega) - f^0_T(\omega) \big\|^2 .$$

We will proceed in three steps:

First, in subsection 3.5, we will derive the optimal, oracle shrinkage intensity
$\zeta_T(\omega)$, which depends on background knowledge of the underlying process.

Second, in subsection 3.6 we will derive the asymptotic behaviour of the
oracle shrinkage intensity. We will see that the necessity to shrink vanishes
asymptotically. This is because the averaged periodogram is a consistent estimator
whereas the shrinkage target is misspecified due to assumption 3.3. As
a consequence, the data driven estimator $\hat f_T(\omega)$ will asymptotically have the
same behaviour as the averaged periodogram, as the data driven estimator of
the shrinkage intensity will converge to zero.

Finally, we will construct a data driven estimator in subsection 3.7.



3.5. Oracle shrinkage intensity 

We will derive the oracle shrinkage intensity by solving the minimization prob- 
lem given in formula (20). This can simply be done by differentiation. Let 
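$z \in [0, 1]$ denote a shrinkage intensity. The risk $R(z)$ associated with $z$ is derived
in Appendix A.1:

$$\begin{aligned}
R(z) &= E \big\| z \hat f^1_T(\omega) + (1-z) \tilde f_T(\omega) - f^0_T(\omega) \big\|^2 \\
&= \sum_{i,j=1}^{p} \Big( z^2\, \mathrm{Var}\, \hat f^1_{ij}(\omega) + (1-z)^2\, \mathrm{Var}\, \tilde f_{ij}(\omega) \\
&\qquad\quad + 2z(1-z)\, \Re\big(\mathrm{Cov}(\hat f^1_{ij}(\omega), \tilde f_{ij}(\omega))\big) \\
&\qquad\quad + z^2\, |f^1_{ij}(\omega) - f^0_{ij}(\omega)|^2 \Big)
\end{aligned} \tag{21}$$

where we have used that $E \tilde f_T(\omega) = f^0_T(\omega)$
and, according to (13), $E \hat f^1_T(\omega) = f^1_T(\omega)$.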
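The first derivative of $R(z)$ with respect to $z$ is:

$$R'(z) = 2 \sum_{i,j=1}^{p} \Big( z\, \mathrm{Var}\, \hat f^1_{ij}(\omega) - (1-z)\, \mathrm{Var}\, \tilde f_{ij}(\omega) + (1-2z)\, \Re\big(\mathrm{Cov}(\hat f^1_{ij}(\omega), \tilde f_{ij}(\omega))\big) + z\, |f^1_{ij}(\omega) - f^0_{ij}(\omega)|^2 \Big).$$

Moreover, the second derivative is

$$R''(z) = 2 \sum_{i,j=1}^{p} \Big( \mathrm{Var}\big(\hat f^1_{ij}(\omega) - \tilde f_{ij}(\omega)\big) + |f^1_{ij}(\omega) - f^0_{ij}(\omega)|^2 \Big) > 0, \tag{22}$$

where we use that $\hat f^1_T(\omega)$ and $\tilde f_T(\omega)$ are Hermitian, so that the imaginary parts
sum to zero.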

Thus, we know that any local extremum will be a minimum. Setting the first 
derivative equal to zero, we obtain the following theorem. 

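Theorem 3.3. The optimal shrinkage intensity is given by

$$\zeta_T(\omega) = \frac{\sum_{i,j=1}^{p} \Big( \mathrm{Var}\, \tilde f_{ij}(\omega) - \Re\, \mathrm{Cov}\big(\hat f^1_{ij}(\omega), \tilde f_{ij}(\omega)\big) \Big)}{\sum_{i,j=1}^{p} \Big( \mathrm{Var}\big(\hat f^1_{ij}(\omega) - \tilde f_{ij}(\omega)\big) + |f^1_{ij}(\omega) - f^0_{ij}(\omega)|^2 \Big)} \tag{23}$$

Proof. The proof is found in A.1. □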
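
The minimization behind the theorem can be checked numerically. The sketch below works with aggregated quantities (the sums over $i, j$ of the variances, the real part of the covariance and the squared bias); the function names and the numerical values are hypothetical and serve only to compare the closed-form stationary point of the quadratic risk with a brute-force grid search.

```python
def risk(z, var_target, var_avg, re_cov, bias_sq):
    """Quadratic risk R(z) in terms of aggregated (summed over i, j) ingredients."""
    return (z * z * var_target + (1 - z) ** 2 * var_avg
            + 2 * z * (1 - z) * re_cov + z * z * bias_sq)

def oracle_intensity(var_target, var_avg, re_cov, bias_sq):
    """Zero of R'(z); note Var(target - avg) = var_target + var_avg - 2 * re_cov."""
    return (var_avg - re_cov) / (var_target + var_avg - 2 * re_cov + bias_sq)

# Hypothetical aggregated moments (illustrative numbers, not from the paper).
args = dict(var_target=0.2, var_avg=1.0, re_cov=0.1, bias_sq=0.5)
z_star = oracle_intensity(**args)

# Brute-force check on a fine grid over [0, 1].
grid = [k / 10000 for k in range(10001)]
z_grid = min(grid, key=lambda z: risk(z, **args))
```

With these numbers the intensity is $0.9/1.5 = 0.6$: a target with low variance but non-negligible bias receives a substantial, though not full, weight.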

3.6. Asymptotic behaviour of optimal shrinkage intensity 

Now, we will examine the asymptotic behavior of the optimal shrinkage intensity 
(23). This will enable us to derive a data driven estimator in the following 
subsection. We first define the following parameters: 



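$$\pi(\omega) = \sum_{i,j=1}^{p} \pi_{ij}(\omega) \tag{24}$$

$$\rho(\omega) = \sum_{i,j=1}^{p} \rho_{ij}(\omega) \tag{25}$$

$$\gamma(\omega) = \sum_{i,j=1}^{p} \gamma_{ij}(\omega) \tag{26}$$

where the subcomponents are defined, respectively, as:

$$\pi_{ij}(\omega) = \mathrm{AsyVar}\big(\sqrt{m_T}\, \tilde f_{ij}(\omega)\big) = |f_{ij}(\omega)|^2 \tag{27}$$

$$\rho_{ij}(\omega) = \mathrm{AsyCov}\big(\sqrt{m_T}\, \hat f^1_{ij}(\omega), \sqrt{m_T}\, \tilde f_{ij}(\omega)\big) = \beta_i \beta_j f_{0j}(\omega) f_{i0}(\omega) \tag{28}$$

$$\gamma_{ij}(\omega) = |f^1_{ij}(\omega) - f^0_{ij}(\omega)|^2 \tag{29}$$

using the notation

$$\mathrm{AsyVar}(\cdot) := \lim_{T \to \infty} \mathrm{Var}(\cdot)$$

and with weights $\beta_i$ defined in equation (7). Now, we can express $\zeta_T(\omega)$ as a
function of (24) to (26) plus a remainder term which converges to zero sufficiently
fast under the following additional assumption: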

Assumption 3.4. The smoothing span $m_T$ is supposed to fulfill $m_T/T \to 0$ as
$T \to \infty$.

Assumption 3.4 is made for the technical purpose of proving the following
theorem, which gives the exact expression of $\zeta_T(\omega)$:

Theorem 3.4. The optimal shrinkage intensity can be expressed as the following
function of the parameters $\pi(\omega)$, $\rho(\omega)$ and $\gamma(\omega)$:

$$\zeta_T(\omega) = \frac{1}{m_T}\, \frac{\pi(\omega) - \rho(\omega)}{\gamma(\omega)} + O\big(T^{-1/2}\big) \tag{30}$$

Proof. The proof is found in A.2. □

This means that the optimal shrinkage intensity converges to zero at a rate
of $1/m_T$. At the same time, it can be approximated by the parameters (24)
to (26) with an error that vanishes, under assumption 3.4, with the faster rate
of $T^{-1/2}$. This will allow us to derive a data driven estimator of the shrinkage
intensity, and thus of $f^0_T(\omega)$, by plugging in estimators for (24) to (26) in (30).



3.7. Data driven estimation 

The final step in deriving a data driven estimator of the spectrum that combines 
the averaged periodogram with a parsimonious, one-factor model based estimator,
is to derive estimators for the parameters $\pi(\omega)$, $\rho(\omega)$ and $\gamma(\omega)$. We will start
by estimating $\pi(\omega)$. According to (24), $\pi_{ij}(\omega)$ is the asymptotic variance of the
$(i,j)$th component of the averaged periodogram, scaled by the smoothing span
$m_T$. The following lemma will give a consistent estimator:

Lemma 3.5. $\pi(\omega)$ is estimated consistently by

$$p(\omega) = \sum_{i,j=1}^{p} p_{ij}(\omega) \tag{31}$$

where

$$p_{ij}(\omega) = \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \big| I_{ij}(\omega + \omega_k) - \tilde f_{ij}(\omega) \big|^2 \tag{32}$$

i.e. $p_{ij}(\omega)$ is the standard estimator of the local variance of the $(i,j)$th component
of the periodogram at frequency $\omega$.

Proof. The proof is given in A.3. □

The next step is to estimate $\rho(\omega)$. We will estimate its components and
distinguish between the components on the diagonal and the components off
the diagonal. As observed earlier, on the diagonal, $\hat f^1_{ii}(\omega) = \tilde f_{ii}(\omega)$; thus we
can use the estimator (32). Off the diagonal, we can use the estimator given
by the following lemma:

Lemma 3.6. For $i \neq j$, a consistent estimator of $\rho_{ij}(\omega)$ is given by

$$r_{ij}(\omega) = b_i b_j\, \tilde f_{0j}(\omega)\, \tilde f_{i0}(\omega) \tag{33}$$

Proof. The proof is given in A.4. □
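
The two plug-in quantities of this subsection are simple to compute once the periodogram ordinates within the smoothing span are available. The sketch below is a minimal illustration under assumptions: the function names, the three periodogram ordinates and the regression slopes are hypothetical, and the product form used for the off-diagonal covariance estimate is the plug-in form assumed here.

```python
def local_variance(I_vals, f_avg):
    """Mean squared modulus of the periodogram ordinates around their average."""
    return sum(abs(v - f_avg) ** 2 for v in I_vals) / len(I_vals)

def cov_estimate(b_i, b_j, f_0j, f_i0):
    """Off-diagonal plug-in b_i * b_j * f_0j * f_i0 (form assumed in this sketch)."""
    return b_i * b_j * f_0j * f_i0

# Hypothetical periodogram ordinates of one (i, j) entry over a span of 3.
I_vals = [1 + 1j, 2 - 1j, 3 + 0j]
f_avg = sum(I_vals) / len(I_vals)   # corresponding averaged periodogram entry
p_ij = local_variance(I_vals, f_avg)

# Hypothetical slopes and averaged cross-periodograms with the market series.
r_ij = cov_estimate(0.5, 2.0, 1 - 1j, 1 + 1j)
```

Since the cross-periodogram entries $(0,j)$ and $(i,0)$ enter as a product, individual terms are complex in general, which is why only the real part of the summed estimate is used in the shrinkage intensity.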

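The estimator of the last of the three parameters, $\gamma(\omega)$, is derived in a
straightforward way:

Lemma 3.7. $\gamma(\omega)$ is estimated consistently by

$$g(\omega) = \sum_{i,j=1}^{p} g_{ij}(\omega) \tag{34}$$

where

$$g_{ij}(\omega) = \big| \hat f^1_{ij}(\omega) - \tilde f_{ij}(\omega) \big|^2 \tag{35}$$

Proof. Both $\hat f^1_T(\omega)$ and $\tilde f_T(\omega)$ are consistent estimators of $f^1_T(\omega)$ and $f^0_T(\omega)$,
respectively. □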

With the help of lemmata 3.5 to 3.7, we can now construct the data driven 
market shrinkage estimator of the spectrum, which is given by the following 
theorem: 

Theorem 3.8. The estimator

$$\hat\zeta_T(\omega) = \frac{1}{m_T}\, \frac{p(\omega) - \Re\big(r(\omega)\big)}{g(\omega)} \tag{36}$$

is a consistent estimator of

$$\frac{1}{m_T}\, \frac{\pi(\omega) - \Re\big(\rho(\omega)\big)}{\gamma(\omega)} .$$

Proof. This is implied by assumption 3.3 in conjunction with lemmata 3.5, 3.6
and 3.7. □

Thus, we have finally arrived at a shrinkage estimator that depends on the
data only, not on background knowledge of the underlying process:

$$\hat f_T(\omega) = \frac{p(\omega) - \Re\big(r(\omega)\big)}{m_T\, g(\omega)}\, \hat f^1_T(\omega) + \left( 1 - \frac{p(\omega) - \Re\big(r(\omega)\big)}{m_T\, g(\omega)} \right) \tilde f_T(\omega) \tag{37}$$

We will refer to this estimator as the DDMSE (data driven market shrinkage
estimator). The following theorem gives the asymptotic behavior of the DDMSE.

Theorem 3.9. Under assumptions 3.1 to 3.4, $\hat f_T(\omega)$ is a consistent estimator of
the spectrum.

Proof. Asymptotically, the optimal shrinkage intensity $\zeta_T(\omega)$ vanishes according
to theorem 3.4. According to theorem 3.8, $\zeta_T(\omega)$ is estimated consistently by
$\hat\zeta_T(\omega)$. This means that $\hat\zeta_T(\omega)$ converges to zero, too, and thus that $\hat f_T(\omega)$
converges to the averaged periodogram. □

The performance of the DDMSE in practice will be examined by extensive 
Monte Carlo simulations in section 4. 
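
The final combination step at a single frequency can be sketched as follows. This is a minimal illustration under assumptions: the function name and all numerical inputs are hypothetical, and clipping the estimated intensity to $[0, 1]$ is a safeguard added in this sketch, not something stated in the text.

```python
def ddmse(f_avg, f_target, p_hat, r_hat, g_hat, m):
    """Combine averaged periodogram and one-factor target at one frequency.

    p_hat, r_hat and g_hat are the summed estimates of pi, rho and gamma, and
    m is the smoothing span. Clipping to [0, 1] is an assumption of this sketch.
    """
    zeta = (p_hat - r_hat.real) / (m * g_hat)
    zeta = min(1.0, max(0.0, zeta))
    n = len(f_avg)
    est = [[zeta * f_target[i][j] + (1.0 - zeta) * f_avg[i][j] for j in range(n)]
           for i in range(n)]
    return zeta, est

# Hypothetical inputs (illustrative numbers only).
f_avg = [[2.0 + 0j, 1.0 + 1j], [1.0 - 1j, 3.0 + 0j]]
f_target = [[2.0 + 0j, 0.5 + 0j], [0.5 + 0j, 3.0 + 0j]]
zeta, est = ddmse(f_avg, f_target, p_hat=12.0, r_hat=4.0 + 0j, g_hat=2.0, m=16)
```

Because the target agrees with the averaged periodogram on the diagonal in this toy input, only the off-diagonal entries are moved; a larger smoothing span $m$ reduces the intensity and leaves the averaged periodogram almost unchanged.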

4. Monte Carlo studies for the DDMSE 

In this section, we will evaluate the performance of the data driven market 
shrinkage estimator in practice. For this, we will perform comprehensive Monte 
Carlo simulations. The DDMSE will have three benchmark estimators to com- 
pete with: 

1. the averaged periodogram 

2. the one-factor model that is the shrinkage target 

3. a competing shrinkage estimator, referred to as DDSSE, that uses the 
identity matrix as the shrinkage target, see Böhm and von Sachs (2007)

In a setting where it is reasonable to use the DDMSE, it should outperform 
all three benchmarks. Such a setting can be characterized as the frequently 
encountered situation where one may fit a factor model to the data, but has no 
background knowledge on how many factors to actually choose. In a screeplot of 
the eigenvalues, one will typically encounter one or more prominent eigenvalues 
followed by a longer tail of small eigenvalues. The method we have developed 
will allow us to avoid the problem of model choice. 

4.1. Setup

For the simulations, we have chosen to use a two-factor model as the true model. 
The first factor is an MA(2) process. Its spectrum has a peak at $\pi/2$. The second
factor driving the process is a Gaussian white noise time series; its variance will 
be varied in a first simulation study, to examine the performance of the DDMSE 
on the 'scale' between almost one-factor model to true two-factor model. Figure 1 
shows the spectrum of the two factors underlying the simulations. These two 
factors are then projected onto a 5-dimcnsional time series according to the 
following model: 

\[
X_t = \Gamma f_t + \varepsilon_t , \qquad \varepsilon_t \sim \mathcal{N}(0, \Omega) \tag{38}
\]

Here, $\Gamma$ is a $5 \times 2$ weight matrix that was chosen at random initially, then fixed
for this section. The components of $\Gamma$ were drawn independently from the
uniform distribution U([.3, 1]):

\[
\Gamma = \begin{pmatrix}
.5871 & .4510 \\
.5676 & .9691 \\
.4645 & .7268 \\
.8691 & .5511 \\
.5379 & .4754
\end{pmatrix}
\]

Fig 1: True spectrum of the two underlying factors. The imaginary parts are all
zero.

The covariance matrix of the idiosyncratic components was obtained likewise:
the off-diagonal components were set to zero, the diagonal components were
simulated as iid uniform U([.2, .4]) and then fixed as

\[
\Omega = \mathrm{diag}(.3213, \; .3726, \; .2646, \; .4169, \; .3257)
\]

The market factor time series was obtained as the mean over dimension of the
simulated data. All simulations presented in this section were repeated for new
realizations of $\{\Gamma, \mathrm{Cov}(\varepsilon)\}$ without any major changes in the results, which is
why we will omit these repetitive studies. A length of T = 1,024 was chosen for
the time series in this section.

Fig 2: MISE of DDMSE, averaged periodogram, 1-factor-model and DDSSE
for data from a 2 factor model. T = 1,024, smoothing span m = 19, different
standard deviations of second factor. Based on M = 1,500 Monte Carlo runs.
Confidence intervals are not printed as there is no intersection at 99% level for
the solid curves.
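
The data-generating process of this subsection can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions, not the code
used for the study: the text does not report the MA(2) coefficients, so the
hypothetical parameter `theta2` (with the first MA coefficient set to zero) is
merely one choice that puts the spectral peak at $\pi/2$. The function name
`simulate_two_factor` and the seed handling are likewise illustrative.

```python
import numpy as np

# Loadings and idiosyncratic variances as reported in the text.
GAMMA = np.array([[.5871, .4510],
                  [.5676, .9691],
                  [.4645, .7268],
                  [.8691, .5511],
                  [.5379, .4754]])
OMEGA_DIAG = np.array([.3213, .3726, .2646, .4169, .3257])


def simulate_two_factor(T=1024, sd_second=1.0, theta2=-0.8, seed=0):
    """Draw one path of X_t = Gamma f_t + eps_t.

    theta2 is a hypothetical MA(2) coefficient (not given in the text):
    f1_t = e_t + theta2 * e_{t-2} has spectrum proportional to
    1 + theta2**2 + 2 * theta2 * cos(2w), which peaks at w = pi/2 if theta2 < 0.
    """
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T + 2)
    f1 = e[2:] + theta2 * e[:-2]                  # MA(2) factor
    f2 = sd_second * rng.standard_normal(T)       # white-noise factor
    factors = np.column_stack([f1, f2])           # shape (T, 2)
    eps = rng.standard_normal((T, 5)) * np.sqrt(OMEGA_DIAG)
    x = factors @ GAMMA.T + eps                   # shape (T, 5)
    market = x.mean(axis=1)                       # market factor: mean over dimension
    return x, market


x, market = simulate_two_factor()
```

Varying `sd_second` moves the simulated data along the 'scale' between an almost
one-factor model and a genuine two-factor model.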



4.2. Influence of the true model

The only formal prerequisite for the true model in order for the DDMSE to
work is that its true spectrum is not that of a one-factor model such as the
one specified in section 3.2. In this subsection, we examine the influence of
the 'distance' from a one-factor model. This is accomplished by using the
two-factor model (38) to generate the data and systematically varying the standard
deviation of the second, flat-spectrum factor. For small standard deviation, the
data are very close to a one-factor model; as the standard deviation of the second
factor increases, so does its influence. The results are given in figure 2. The effects
we observe in the simulation study confirm our assumptions on the respective
behavior of averaged periodogram, one-factor model, DDSSE and DDMSE. First
of all, we remark that the DDMSE performs best for all choices of the white noise
variance in the simulations. The averaged periodogram, upon which we want to
improve, exhibits the worst performance. Not only is it outperformed by the
DDSSE, which we would have expected based on the results of the preceding
section, but also by the one-factor model. This shows that, in this context,
the one-factor model is a useful model in itself, even though it is actually
misspecified. It even outperforms the DDSSE for most choices of white noise
variance. Overall, the MISE increases with the variance of the second factor,
and the MISE curves of the different estimators run roughly in parallel.

Fig 3: MISE of DDMSE, averaged periodogram, 1-factor-model and DDSSE for
data from a 2 factor model. T = 1,024, different smoothing spans. Based on
M = 1,500 Monte Carlo runs.

4.3. Influence of the smoothing span

In the next Monte Carlo study, we have varied the smoothing span and examined
its influence on the MISE. The results are given in figure 3. Not surprisingly, we
observe that the overall MISE of all estimators decreases as the smoothing span,
which plays the role of the effective sample size, is increased. For small
smoothing span, the averaged periodogram exhibits the worst performance. The
DDSSE performs better than the averaged periodogram for small smoothing span,
but is outperformed by the one-factor model and by the DDMSE. For the very
small smoothing span m = 7, the DDMSE and the one-factor model have
approximately the same MISE. Then, we have again the ranking averaged
periodogram, DDSSE, one-factor model, DDMSE (from worst to best), as
in the preceding subsection. Finally, for a comparatively large smoothing span
of m = 31 or larger, the DDMSE, DDSSE and averaged periodogram seem to
have approximately the same MISE. This is again not surprising, as for fixed
dimension, both data driven estimators converge to the averaged periodogram.
Moreover, for large smoothing span, the one-factor model performs worse than
the averaged periodogram. This is, however, not due to a loss of performance of
the one-factor model, which improves monotonically with m, but rather due to
the faster improvement of the averaged periodogram in terms of MISE. Finally,
the deterioration of the estimator based on the one-factor model relative to
the averaged periodogram for large smoothing span does not make the DDMSE
perform worse than the averaged periodogram. This can be explained by the
fact that, for large m, the shrinkage intensity becomes negligibly small.
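
To make the role of the smoothing span as local sample size concrete, the
following sketch computes the averaged periodogram matrix at a single frequency
for several spans and reports its rank and condition number. It assumes the
normalisation $I(\omega_k) = d(\omega_k) d(\omega_k)^* / (2\pi T)$ and a simple
unweighted average over the $m$ nearest Fourier frequencies; the function names
and the white-noise placeholder data are hypothetical and not taken from the
paper.

```python
import numpy as np


def periodogram_matrices(x):
    """Return I(w_k) for all Fourier frequencies w_k = 2*pi*k/T, shape (T, p, p)."""
    T = x.shape[0]
    d = np.fft.fft(x - x.mean(axis=0), axis=0)       # d[k, i]: DFT of series i
    return np.einsum("ki,kj->kij", d, d.conj()) / (2 * np.pi * T)


def averaged_periodogram(x, k0, m):
    """Average the m periodogram matrices centred at Fourier index k0 (m odd)."""
    T = x.shape[0]
    half = (m - 1) // 2
    idx = np.arange(k0 - half, k0 + half + 1) % T
    return periodogram_matrices(x)[idx].mean(axis=0)


rng = np.random.default_rng(1)
x = rng.standard_normal((1024, 5))       # placeholder data: 5-dim white noise
for m in (3, 7, 19, 31):
    f_tilde = averaged_periodogram(x, k0=256, m=m)   # k0 = 256 is w = pi/2
    print(m, np.linalg.matrix_rank(f_tilde), np.linalg.cond(f_tilde))
```

For a span smaller than the dimension (here m = 3 < 5) the averaged periodogram
is singular, since it is an average of m rank-one matrices; the conditioning
improves as m grows.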

5. Conclusions 

Our work deals with the concept of shrinkage in the frequency domain of mul- 
tivariate time series. Similarly to our companion paper Bohm and von Sachs 
(2007), it uses a new, localized concept of shrinkage that allows for the devel- 
opment of estimators that simultaneously overcome the problem of numerical 
instability due to high dimensionality or collinearity and have lower quadratic 
risk. In contrast to the developments in the time domain of Ledoit and Wolf
(2003), in the frequency domain of nonparametric estimation of the spectral
density matrix by smoothing the periodogram matrix, all considerations have
to be undertaken with respect to the (locally) effectively available sample size,
which is governed by the smoothing parameter (and not the sample size alone).
In Bohm and von Sachs (2007) asymptotic theory has been derived for the
situation of shrinkage towards a multiple of the identity matrix where both the
dimensionality $p = p_T$ and the smoothing span $m = m_T$ tend to infinity as
the length of the time series $T \to \infty$. In this paper, we have contented
ourselves with investigating the theoretical properties of our proposed estimator by
classical asymptotics, noting that a transfer to the more complex situation of
"Kolmogorov" or double asymptotics would be possible as well. However, with
this work on structural shrinkage, we want to put emphasis onto a different 
aspect of shrinkage, perhaps driven by a more applied interest. Using the iden- 
tity matrix as a shrinkage target is reasonable if there is little or no knowledge 
about the underlying multidimensional structure of the data. However, in many 
situations, in particular in economic applications, it is more rewarding to in- 
corporate potentially available background knowledge on the underlying cross 
dimensional structure of the data into the shrinkage target. This opens up the 
way to designing 'custom made' shrinkage estimators that offer a new answer 
to problems of model choice. In a given setting where a class of parametric or 
semi-parametric estimators is eligible, and the order has to be chosen, instead of 
relying on criteria such as AIC or BIC, the minimum order model can be used 
as a target towards which to shrink. Instead of calling the method "shrinkage"
we might as well describe it as stretching: a too parsimonious model is fitted and
the estimate is then refined by adding the periodogram as a stretching target
that has low bias and high variance.

In addition to showing that a MISE-optimal "oracle" shrinkage intensity can
be consistently estimated from the data, we have shown by our Monte Carlo
simulations, even for small sample size, the large gain in terms of $L_2$ risk of our
estimator, in a situation where additional structure is available, over the following
competitors: the classical averaged periodogram, the "shrinkage to identity"
estimator of Bohm and von Sachs (2007) and an estimator based on a fully
parametric factor model. Simulations not reported here also demonstrate that
shrinkage can be applied to tapered data; as tapering improves the rate of the
bias without changing the rate of consistency, this is easy to carry over to the
theory. For similar reasons, it is possible to replace the averaging of the
periodogram by kernel smoothing.

An important field of application of our approach would be factor modelling 
of panels of economic time series data of comparatively high dimensionality. 
We recall that "high dimension" needs to be understood as high compared to
the "effective sample size" $m_T$. The results of this paper suggest that
it could be possible to circumvent the problem of searching for an appropriate
factor dimension, a problem still not satisfactorily solved in the literature, in
particular for dynamic factor models. This latter application calls for a possible
theoretical direction of future research: the generalization of our approach to a 
dynamic (and latent) factor model setting that allows for lag effects.

Appendix A: Proofs

We will make frequent use of the abbreviations $\tilde\omega$, which means the Fourier
frequency nearest $\omega$, and $\tilde\omega_k := \tilde\omega + \omega_k$.



A.1. Proofs of equation (21) and of theorem 3.3

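We begin by showing equation (21) which can be decomposed as follows:

\begin{align*}
R(z) &= \mathrm{E}\, \bigl\| z \hat f^{F}(\omega) + (1-z) \tilde f(\omega) - f(\omega) \bigr\|^2 \\
&= \sum_{i,j=1}^{p} \mathrm{E}\, \bigl| z \hat f^{F}_{ij}(\omega) + (1-z) \tilde f_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \\
&= \sum_{i,j=1}^{p} \Bigl( \mathrm{Var}\bigl( z \hat f^{F}_{ij}(\omega) + (1-z) \tilde f_{ij}(\omega) - f_{ij}(\omega) \bigr)
   + \bigl| \mathrm{E}\bigl( z \hat f^{F}_{ij}(\omega) + (1-z) \tilde f_{ij}(\omega) \bigr) - f_{ij}(\omega) \bigr|^2 \Bigr) \\
&= \sum_{i,j=1}^{p} \Bigl( z^2\, \mathrm{Var}\, \hat f^{F}_{ij}(\omega) + (1-z)^2\, \mathrm{Var}\, \tilde f_{ij}(\omega) \\
&\qquad\quad + z(1-z)\, \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr)
   + z(1-z)\, \mathrm{Cov}\bigl( \tilde f_{ij}(\omega), \hat f^{F}_{ij}(\omega) \bigr) \\
&\qquad\quad + z^2 \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr) \\
&= \sum_{i,j=1}^{p} \Bigl( z^2\, \mathrm{Var}\, \hat f^{F}_{ij}(\omega) + (1-z)^2\, \mathrm{Var}\, \tilde f_{ij}(\omega) \\
&\qquad\quad + 2 z(1-z)\, \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr)
   + z^2 \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr)
\end{align*}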

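Then we want to derive the optimal shrinkage intensity $\zeta_T(\omega)$, which is the
solution of the optimization problem (20). According to (22), any local extremum
of the function $R(z)$ is a minimum. Thus, $\zeta_T(\omega)$ is the value obtained for $z$ by
setting the first derivative equal to zero:

\begin{align*}
0 &= R'(\zeta_T(\omega)) \\
\Leftrightarrow\quad 0 &= 2 \sum_{i,j=1}^{p} \Bigl\{ \zeta_T(\omega)\, \mathrm{Var}\, \hat f^{F}_{ij}(\omega)
   - (1 - \zeta_T(\omega))\, \mathrm{Var}\, \tilde f_{ij}(\omega) \\
&\qquad\qquad + (1 - 2\zeta_T(\omega))\, \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr)
   + \zeta_T(\omega) \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr\} \\
\Leftrightarrow\quad 2 \zeta_T(\omega) \sum_{i,j=1}^{p} & \Bigl\{ \mathrm{Var}\, \hat f^{F}_{ij}(\omega) + \mathrm{Var}\, \tilde f_{ij}(\omega)
   - 2 \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr)
   + \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr\} \\
&= 2 \sum_{i,j=1}^{p} \Bigl\{ \mathrm{Var}\, \tilde f_{ij}(\omega)
   - \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr) \Bigr\} \\
\Leftrightarrow\quad 2 \zeta_T(\omega) \sum_{i,j=1}^{p} & \Bigl\{ \mathrm{Var}\bigl( \hat f^{F}_{ij}(\omega) - \tilde f_{ij}(\omega) \bigr)
   + \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr\} \\
&= 2 \sum_{i,j=1}^{p} \Bigl\{ \mathrm{Var}\, \tilde f_{ij}(\omega)
   - \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr) \Bigr\} \\
\Leftrightarrow\quad \zeta_T(\omega) &=
   \frac{\sum_{i,j=1}^{p} \Bigl( \mathrm{Var}\, \tilde f_{ij}(\omega)
   - \Re \bigl( \mathrm{Cov}\bigl( \hat f^{F}_{ij}(\omega), \tilde f_{ij}(\omega) \bigr) \bigr) \Bigr)}
   {\sum_{i,j=1}^{p} \Bigl( \mathrm{Var}\bigl( \hat f^{F}_{ij}(\omega) - \tilde f_{ij}(\omega) \bigr)
   + \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr)}
\end{align*}

□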
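
The first-order condition above is easy to check numerically. The sketch below
treats the scalar case (a single pair i, j) with made-up values for the two
variances, the real part of the covariance and the squared bias of the target;
these numbers and the function names are purely illustrative. It compares the
closed-form intensity with a brute-force minimisation of R(z) on a grid.

```python
import numpy as np


def risk(z, var_f, var_s, re_cov, bias2):
    """Quadratic risk of z * target + (1 - z) * smoothed periodogram (scalar case)."""
    return (z**2 * var_f + (1 - z)**2 * var_s
            + 2 * z * (1 - z) * re_cov + z**2 * bias2)


def optimal_intensity(var_f, var_s, re_cov, bias2):
    """Closed-form minimiser obtained by setting R'(z) = 0."""
    # The denominator is Var(target - smoothed) + squared bias of the target.
    return (var_s - re_cov) / (var_f + var_s - 2 * re_cov + bias2)


# Hypothetical illustrative values (not taken from the paper).
args = dict(var_f=0.02, var_s=0.10, re_cov=0.01, bias2=0.05)
z_star = optimal_intensity(**args)

grid = np.linspace(0.0, 1.0, 100001)
z_grid = grid[np.argmin(risk(grid, **args))]
print(z_star, z_grid)
```

A larger squared bias of the target or a smaller variance of the smoothed
periodogram both push the intensity towards zero, which is the mechanism behind
the behaviour observed for large smoothing spans.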



A.2. Proof of theorem 3.4

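Theorem 3.4 is proven using two technical lemmata, which we give immediately
after the proof. If we multiply (23) by $m_T$, we obtain

\[
m_T\, \zeta_T(\omega) =
\frac{\sum_{i,j=1}^{p} \Bigl( \mathrm{Var}\bigl( \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr)
 - \Re \bigl( \mathrm{Cov}\bigl( \sqrt{m_T}\, \hat f^{F}_{ij}(\omega), \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr) \bigr) \Bigr)}
{\sum_{i,j=1}^{p} \Bigl( \mathrm{Var}\bigl( \hat f^{F}_{ij}(\omega) - \tilde f_{ij}(\omega) \bigr)
 + \bigl| f^{F}_{ij}(\omega) - f_{ij}(\omega) \bigr|^2 \Bigr)}
\tag{39}
\]

$\tilde f_{ij}(\omega)$ and $\hat f^{F}_{ij}(\omega)$ are consistent estimators of $f_{ij}(\omega)$ and $f^{F}_{ij}(\omega)$, respectively.
This means that

\[
\mathrm{Var}\bigl( \hat f^{F}_{ij}(\omega) - \tilde f_{ij}(\omega) \bigr) = o(1) \tag{40}
\]

Using assumption 3.3 and (40), we obtain that the denominator of the right
hand side of (39) is $O(1)$. The numerator of the right hand side of (39) is
$\pi(\omega) - \Re(\rho(\omega)) + O\bigl( 1/\sqrt{T} \bigr)$ according to lemmata A.1 and A.2. This yields

\[
m_T\, \zeta_T(\omega) = \frac{\pi(\omega) - \Re(\rho(\omega))}{\gamma(\omega)} + O\Bigl( \frac{1}{\sqrt{T}} \Bigr) \tag{41}
\]

or, equivalently

\[
\zeta_T(\omega) = \frac{1}{m_T}\, \frac{\pi(\omega) - \Re(\rho(\omega))}{\gamma(\omega)} + O\bigl( T^{-1/2} \bigr) \tag{42}
\]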

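□

A.2.1. Lemmata needed for A.2 (proof of theorem 3.4)

Lemma A.1.

\[
\mathrm{Var}\bigl( \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr) = \pi_{ij}(\omega) + O\Bigl( \frac{m_T}{T} \Bigr) \tag{43}
\]

Proof.

\begin{align*}
\mathrm{Var}\bigl( \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr)
&= \mathrm{Var}\Bigl( \frac{1}{\sqrt{m_T}} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_{ij}(\tilde\omega_k) \Bigr) \\
&= \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \mathrm{Var}\bigl( I_{ij}(\tilde\omega_k) \bigr)
 + \frac{1}{m_T} \sum_{\substack{k,l=-(m_T-1)/2 \\ k \neq l}}^{(m_T-1)/2} \mathrm{Cov}\bigl( I_{ij}(\tilde\omega_k), I_{ij}(\tilde\omega_l) \bigr) \\
&= |f_{ij}(\omega)|^2 + O\Bigl( \frac{m_T}{T} \Bigr) + \frac{1}{m_T}\, O\Bigl( \frac{m_T^2}{T} \Bigr) \\
&= |f_{ij}(\omega)|^2 + O\Bigl( \frac{m_T}{T} \Bigr)
\end{align*}

This proves equation (43) and yields that

\[
\pi_{ij}(\omega) = |f_{ij}(\omega)|^2 \tag{44}
\]

□

Lemma A.2. For $i \neq j$,

\[
\mathrm{Cov}\bigl( \sqrt{m_T}\, \hat f^{F}_{ij}(\omega), \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr)
 = \rho_{ij}(\omega) + O\Bigl( \frac{1}{\sqrt{T}} \Bigr) \tag{45}
\]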



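Proof. In the following estimate, we make use of (16), i.e. the convergence in
probability of $b_i$ coming from equation (14).

In order to control the error in replacing the random $b_i$ by their limits $\beta_i$ we
use Cauchy's inequality applied to all occurring remainder terms of the following
or similar type

\[
\mathrm{Cov}\bigl( (b_i - \beta_i)\, \beta_j\, I_{00}(\tilde\omega_k),\; I_{ij}(\tilde\omega_l) \bigr) .
\]

With this we can derive that

\begin{align*}
\mathrm{Cov}\bigl( \sqrt{m_T}\, \hat f^{F}_{ij}(\omega), \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr)
&= \mathrm{Cov}\Bigl( \frac{1}{\sqrt{m_T}} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} b_i b_j\, I_{00}(\tilde\omega_k),\;
   \frac{1}{\sqrt{m_T}} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_{ij}(\tilde\omega_k) \Bigr) \\
&= \frac{\beta_i \beta_j}{m_T} \sum_{k,l=-(m_T-1)/2}^{(m_T-1)/2}
   \mathrm{Cov}\bigl( I_{00}(\tilde\omega_k), I_{ij}(\tilde\omega_l) \bigr) + O\Bigl( \frac{1}{\sqrt{T}} \Bigr) \\
&= \frac{\beta_i \beta_j}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2}
   \mathrm{Cov}\bigl( I_{00}(\tilde\omega_k), I_{ij}(\tilde\omega_k) \bigr)
   + O\Bigl( \frac{m_T}{T} \Bigr) + O\Bigl( \frac{1}{\sqrt{T}} \Bigr) , \tag{46}
\end{align*}

where we have used that $\mathrm{Cov}\bigl( I_{00}(\tilde\omega_k), I_{ij}(\tilde\omega_l) \bigr) = O(1/T)$ for $k \neq l$. Showing this
is parallel to treating the leading term, i.e. the covariance term in (46), using
lemma A.3 and lemma A.4:

\begin{align*}
\mathrm{Cov}\bigl( I_{00}(\omega), I_{ij}(\omega) \bigr)
&= \mathrm{Cov}\bigl( d_0(\omega)\, \overline{d_0(\omega)},\; d_i(\omega)\, \overline{d_j(\omega)} \bigr) \\
&= \mathrm{Cov}\bigl( d_0(\omega), d_i(\omega) \bigr)\, \mathrm{Cov}\bigl( \overline{d_0(\omega)}, \overline{d_j(\omega)} \bigr)
 + \mathrm{Cov}\bigl( d_0(\omega), \overline{d_j(\omega)} \bigr)\, \mathrm{Cov}\bigl( \overline{d_0(\omega)}, d_i(\omega) \bigr) \\
&= \mathrm{E}\bigl( d_0(\omega)\, \overline{d_i(\omega)} \bigr)\, \mathrm{E}\bigl( \overline{d_0(\omega)}\, d_j(\omega) \bigr)
 + O\Bigl( \frac{1}{T} \Bigr)\, O\Bigl( \frac{1}{T} \Bigr) \\
&= f_{0i}(\omega)\, f_{j0}(\omega) + O\Bigl( \frac{1}{T} \Bigr) \tag{47}
\end{align*}

Combining (46) and (47) thus yields

\[
\mathrm{Cov}\bigl( \sqrt{m_T}\, \hat f^{F}_{ij}(\omega), \sqrt{m_T}\, \tilde f_{ij}(\omega) \bigr)
 = \beta_i \beta_j\, f_{0i}(\omega)\, f_{j0}(\omega) + O\Bigl( \frac{1}{\sqrt{T}} \Bigr) \tag{48}
\]

which proves (45) and at the same time yields

\[
\rho_{ij}(\omega) = \beta_i \beta_j\, f_{0i}(\omega)\, f_{j0}(\omega) . \tag{49}
\]

□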
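
Lemma A.1 can be illustrated by a small Monte Carlo experiment in the simplest
possible case: univariate Gaussian white noise with variance $\sigma^2$, whose
spectrum is flat, $f(\omega) = \sigma^2/(2\pi)$. The sketch assumes the periodogram
normalisation $I(\omega_k) = |d(\omega_k)|^2/(2\pi T)$ and an unweighted average over $m$
Fourier frequencies; all names and parameter values are illustrative choices,
not those of the paper. The empirical variance of $\sqrt{m}\,\tilde f(\omega)$ should then be
close to $f(\omega)^2$.

```python
import numpy as np


def smoothed_periodogram_at(x, k0, m):
    """Average of m periodogram ordinates around Fourier index k0 (univariate)."""
    T = x.shape[-1]
    d = np.fft.fft(x, axis=-1)
    I = np.abs(d) ** 2 / (2 * np.pi * T)
    half = (m - 1) // 2
    return I[..., k0 - half:k0 + half + 1].mean(axis=-1)


rng = np.random.default_rng(0)
T, m, reps, sigma2 = 256, 9, 4000, 1.0
x = rng.standard_normal((reps, T)) * np.sqrt(sigma2)    # Gaussian white noise
f_tilde = smoothed_periodogram_at(x, k0=T // 4, m=m)    # frequency pi/2

f_true = sigma2 / (2 * np.pi)                           # flat spectrum
scaled_var = m * f_tilde.var(ddof=1)                    # Var(sqrt(m) * f_tilde)
print(scaled_var, f_true**2)
```

The same experiment with a multivariate series and the cross-periodogram would
be the natural way to probe Lemma A.2, at the cost of also simulating the market
factor and the regression coefficients.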

A.3. Proof of lemma 3.5

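Proof. According to (Brockwell and Davis, 1987, theorem 10.3.2), we have

\[
\mathrm{Var}\, I_{ij}(\omega_k) = |f_{ij}(\omega_k)|^2 + O\bigl( 1/\sqrt{T} \bigr) \tag{50}
\]

and

\[
\mathrm{Cov}\bigl( I_{ij}(\omega_k), I_{ij}(\omega_l) \bigr) = O(1/T) \tag{51}
\]

for $0 < \omega_k \neq \omega_l < \pi$. Furthermore, $I_{ij}(\tilde\omega_k)$ is
$O_p(1)$ for all $\tilde\omega_k$ in the span of $m_T$.
This allows us to write

\[
\tilde f_{ij}(\omega) = \mathrm{E}\, I_{ij}(\tilde\omega_k) + O(m_T/T) + O_p\bigl( 1/\sqrt{m_T} \bigr)
\]

and hence

\begin{align*}
\hat\pi_{ij}(\omega)
&= \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \bigl| I_{ij}(\tilde\omega_k) - \tilde f_{ij}(\omega) \bigr|^2 \\
&= \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \bigl| I_{ij}(\tilde\omega_k) - \mathrm{E}\, I_{ij}(\tilde\omega_k) + o_p(1) \bigr|^2 \\
&= \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \bigl| I_{ij}(\tilde\omega_k) - \mathrm{E}\, I_{ij}(\tilde\omega_k) \bigr|^2 + o_p(1) ,
\end{align*}

having used that $\bigl| I_{ij}(\tilde\omega_k) - \mathrm{E}\, I_{ij}(\tilde\omega_k) \bigr|^2$ is $O_p(1)$. It remains to show that

\[
\frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} \bigl| I_{ij}(\tilde\omega_k) - \mathrm{E}\, I_{ij}(\tilde\omega_k) \bigr|^2
\]

converges to $|f_{ij}(\omega)|^2 = \pi_{ij}(\omega)$ in probability.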

We observe that (50) allows us to control convergence of the mean, whereas
we can control the variance by borrowing strength from a proof of a CLT for
$\sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_{ij}(\tilde\omega_k)$. One technique, frequently used and to be found, e.g.,
in Brillinger (1975), proof of Theorem 7.4.4., is to show that the cumulants
of higher order than 2 of $\sqrt{m_T}\, \frac{1}{m_T} \sum_{k=-(m_T-1)/2}^{(m_T-1)/2} I_{ij}(\tilde\omega_k)$ tend to zero, i.e. in
particular the cumulants of order 4. But this includes in particular that

\[
\frac{1}{m_T^2} \sum_{k} \sum_{l} \mathrm{Cov}\bigl( I_{ij}^2(\tilde\omega_k),\; I_{ij}^2(\tilde\omega_l) \bigr) \to 0 ,
\]

which is what is needed here. (An explicit calculation of this covariance would
also be possible by application of Brillinger (1975), Theorem 4.3.1, using the
fact that all but the second-order cumulants of order one up to eight of the
occurring Gaussian mean zero $d_i(\omega)$ are identically zero, and that the second-
order cumulants are products of expressions of the form $\mathrm{Cov}\bigl( d_i(\tilde\omega_k), d_j(\tilde\omega_l) \bigr)$
which tend to zero for $k \neq l$.) □



A.4. Proof of lemma 3.6

Proof. $b_i$, $b_j$, $\tilde f_{0i}(\omega)$ and $\tilde f_{j0}(\omega)$ are consistent estimators of $\beta_i$, $\beta_j$, $f_{0i}(\omega)$ and
$f_{j0}(\omega)$. This yields, in conjunction with (49), the result. □

A.5. Additional lemmata

Lemma A.3. Let $(X_1, X_2, X_3, X_4)$ be a 4-variate normal random variable. Then
we have 



where the null convergence is uniform in $\{\omega_1, \omega_2\} \in (0, 2\pi] \times (0, 2\pi]$.

Proof. The proof of this can be found in Shumway and Stoffer (2000, p. 275ff).